Review for NeurIPS paper: Deep learning versus kernel learning: an empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel
Additional Feedback: Minor issues
* Visualization method of Figure 1: I am not sure how the authors produced this figure. Is it based on a PCA of the trajectories? It is also unclear why the trajectories appear as straight lines; training the Taylorized model (2) is just linear regression. More technically, when a data-dependent NTK is used in a linearized model, the positive definiteness of this NTK is non-trivial, and the equivalence to kernel regression becomes unclear.
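For concreteness, here is a minimal JAX sketch (an assumption of mine, not the authors' code) of the kind of Taylorized model the review's equation (2) refers to: a first-order expansion of the network around its initialization w0, so that the model is linear in the weight displacement w - w0. The toy network f is an illustrative stand-in for the actual architecture.

```python
# Hedged sketch of a Taylorized (linearized) model around initialization w0:
#   f_lin(x; w) = f(x; w0) + J(x; w0) @ (w - w0)
# Fitting f_lin by least squares is linear regression in (w - w0).
import jax
import jax.numpy as jnp

def f(w, x):
    # Toy two-layer network standing in for the actual architecture.
    return jnp.tanh(x @ w["W1"]) @ w["W2"]

def f_lin(w, w0, x):
    # First-order Taylor expansion of f around w0, computed via a JVP
    # in the direction of the weight displacement dw = w - w0.
    dw = jax.tree_util.tree_map(jnp.subtract, w, w0)
    y0, jvp_out = jax.jvp(lambda p: f(p, x), (w0,), (dw,))
    return y0 + jvp_out
```

At w = w0 the two models coincide exactly, and f_lin is linear in w, which is the sense in which its training is "just linear regression" in the review's remark.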
Meta-review for NeurIPS paper: Deep learning versus kernel learning: an empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel
The reviews for this paper were overall positive. The paper presents an empirical inquiry into the geometry of the loss landscape and the data-dependent neural tangent kernel. The authors examine the dynamics of the kernel and the loss landscape, comparing the learned kernel to the corresponding neural networks. The reviewers appreciated the evaluation of the so-called 'parent-child spawning' phenomenon and of the approximation accuracy of data-dependent neural tangent kernels. The authors made a laudable effort to question common preconceptions about neural tangent kernels through an extensive set of numerical experiments, leading to interesting empirical observations.
Deep learning versus kernel learning: an empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel
In suitably initialized wide networks, small learning rates transform deep neural networks (DNNs) into neural tangent kernel (NTK) machines, whose training dynamics is well-approximated by a linear weight expansion of the network at initialization. Standard training, however, diverges from its linearization in ways that are poorly understood. We study the relationship between the training dynamics of nonlinear deep networks, the geometry of the loss landscape, and the time evolution of a data-dependent NTK. We do so through a large-scale phenomenological analysis of training, synthesizing diverse measures characterizing loss landscape geometry and NTK dynamics. In multiple neural architectures and datasets, we find these diverse measures evolve in a highly correlated manner, revealing a universal picture of the deep learning process.
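To make the object of study concrete, the following hedged JAX sketch computes the empirical (data-dependent) NTK at the current parameters w, K(x1, x2) = J(x1; w) J(x2; w)^T; tracking how this matrix changes along training is the "time evolution" named in the title. The scalar-output convention and function names here are illustrative assumptions, not the paper's implementation.

```python
# Hedged sketch: empirical (data-dependent) NTK at parameters w,
#   K(x1, x2) = J(x1; w) @ J(x2; w).T,
# for a network f(w, x) with scalar output per example, shape (batch,).
import jax
import jax.numpy as jnp

def empirical_ntk(f, w, x1, x2):
    # Per-example Jacobians w.r.t. every parameter leaf, flattened into
    # rows so the kernel becomes a single matrix product.
    def flat_jac(x):
        j = jax.jacobian(lambda p: f(p, x))(w)  # pytree of (n, *param_shape)
        leaves = jax.tree_util.tree_leaves(j)
        return jnp.concatenate([l.reshape(x.shape[0], -1) for l in leaves], axis=1)
    return flat_jac(x1) @ flat_jac(x2).T        # Gram matrix, shape (n1, n2)
```

In the linearized (infinite-width, small learning rate) regime this Gram matrix stays essentially frozen at its value at initialization, which is the kernel-machine picture the abstract contrasts with standard training.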